IBM Parallel Sysplex

In computing, a Parallel Sysplex is a cluster of IBM mainframes acting together as a single system image with z/OS. Used for disaster recovery, Parallel Sysplex combines data sharing and parallel computing to allow a cluster of up to 32 systems to share a workload for high performance and high availability.


Sysplex

In 1990, IBM mainframe computers introduced the concept of a Systems Complex, commonly called a Sysplex, with MVS/ESA SP V4.1. This allows authorized components in up to eight logical partitions (LPARs) to communicate and cooperate with each other using the XCF protocol.

Components of a Sysplex include:
* A common time source to synchronize all member systems' clocks. This can be either a Sysplex Timer (Model 9037) or the Server Time Protocol (STP)
* Global Resource Serialization (GRS), which allows multiple systems to access the same resources concurrently, serializing where necessary to ensure exclusive access
* Cross-System Coupling Facility (XCF), which allows systems to communicate peer-to-peer
* Couple Data Sets (CDS)

Users of a (base) Sysplex include:
* Console services – allowing one to merge multiple MCS consoles from the different members of the Sysplex, providing a single system image for Operations
* Automatic Restart Manager (ARM) – Policy to direct automatic restart of failed jobs or started tasks on the same system if it is available, or on another LPAR in the Sysplex
* Sysplex Failure Manager (SFM) – Policy that specifies automated actions to take when certain failures occur, such as loss of a member of a Sysplex, or when reconfiguring systems
* Workload Manager (WLM) – Policy-based performance management of heterogeneous workloads across one or more z/OS images or even on AIX
* Global Resource Serialization (GRS) communication – allows use of XCF links instead of dedicated channels for GRS, and Dynamic RNLs
* Tivoli OPC – Hot standby support for the controller
* RACF (IBM's mainframe security software product) – Sysplex-wide RVARY and SETROPTS commands
* PDSE file sharing
* Multisystem VLFNOTE, SDUMP, SLIP, DAE
* Resource Measurement Facility (RMF) – Sysplex-wide reporting
* CICS – uses XCF to provide better performance and response time than using VTAM for transaction routing and function shipping
* zFS – uses XCF communication to access data across multiple LPARs
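
The ARM restart rule listed above (restart on the same system if it is available, otherwise on another LPAR in the Sysplex) can be sketched as follows. This is a minimal illustration rather than ARM's actual policy format: the function name `choose_restart_system` and the system names `SYSA`, `SYSB`, and `SYSC` are hypothetical, and choosing the first available member as the fallback is an assumption made for simplicity.

```python
from typing import Optional, Sequence


def choose_restart_system(failed_on: str, available: Sequence[str]) -> Optional[str]:
    """Pick where to restart a failed job or started task.

    Prefer the system the work was running on. If that system is not
    available, fall back to the first available member of the sysplex
    (an assumed tie-break). Return None if no member is available.
    """
    if failed_on in available:
        return failed_on
    return available[0] if available else None


# The job failed but its system is still up: restart in place.
print(choose_restart_system("SYSA", ["SYSA", "SYSB"]))
# The whole system is gone: restart on another member.
print(choose_restart_system("SYSA", ["SYSB", "SYSC"]))
```

A real policy would typically also weigh capacity and dependencies between restarted elements; the sketch only captures the same-system-first ordering stated in the text.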


Parallel Sysplex

IBM introduced the Parallel Sysplex in April 1994 with the addition of the 9674 Coupling Facility (CF), new S/390 models, upgrades to existing models, coupling links for high-speed communication, and MVS/ESA SP V5.1 operating system support.

The Coupling Facility (CF) may reside on a dedicated stand-alone server configured with processors that can run Coupling Facility control code (CFCC), on integral processors on the mainframes themselves configured as ICFs (Internal Coupling Facilities), or, less commonly, in normal LPARs. The CF contains Lock, List, and Cache structures to help with serialization, message passing, and buffer consistency between multiple LPARs.

The primary goal of a Parallel Sysplex is to provide data sharing capabilities, allowing multiple databases to read and write shared data directly. This can provide benefits such as:
* Helping to remove single points of failure within the server, LPAR, or subsystems
* Application availability
* Single system image
* Dynamic session balancing
* Dynamic transaction routing
* Scalable capacity

Databases running on the System z server that can take advantage of this include:
* IBM Db2
* IBM Information Management System (IMS)
* VSAM (VSAM/RLS)
* IDMS
* Adabas
* DataCom
* Oracle

Other components can use the Coupling Facility to help with system management, performance, or reduced hardware requirements. Called "Resource Sharing", uses include:
* Catalog – shared catalogs to improve performance by reducing I/O to a catalog data set on disk
* CICS – using the CF to provide sharing and recovery capabilities for named counters, data tables, or transient data
* DFSMShsm – workload balancing for data migration workload
* GRS Star – reduced CPU usage and response time for data set allocation. Tape Switching uses the GRS structure to provide sharing of tape units between z/OS images.
* Dynamic CHPID Management (DCM), and I/O priority management
* JES2 Checkpoint – provides improved access to a multisystem checkpoint
* Operlog / Logrec – merged multisystem logs for system management
* RACF – shared data set to simplify security management across the Parallel Sysplex
* WebSphere MQ – shared message queues for availability and flexibility
* WLM – provides support for Intelligent Resource Director (IRD) to extend the z/OS Workload Manager to help manage CPU and I/O resources across multiple LPARs within the Parallel Sysplex. Functions include LPAR CPU management, IRD, and multi-system enclave management for improved performance
* XCF Star – reduced hardware requirements and simplified management of XCF communication paths

Major components of a Parallel Sysplex include:
* Coupling Facility (CF or ICF) hardware, allowing multiple processors to share, cache, update, and balance data access;
* Sysplex Timers (more recently, Server Time Protocol) to synchronize the clocks of all member systems;
* High-speed, high-quality, redundant cabling;
* Software (operating system services and, usually, middleware such as IBM Db2).

The Coupling Facility may be either a dedicated external system (a small mainframe, such as a System z9 BC, specially configured with only coupling facility processors) or integral processors on the mainframes themselves configured as ICFs (Internal Coupling Facilities). It is recommended that at least one external CF be used in a Parallel Sysplex, and that a Parallel Sysplex have at least two CFs and/or ICFs for redundancy, especially in a production data sharing environment.

Server Time Protocol (STP) replaced the Sysplex Timers beginning in 2005 for System z mainframe models z990 and newer. A Sysplex Timer is a physically separate piece of hardware from the mainframe, whereas STP is an integral facility within the mainframe's microcode. With STP and ICFs it is possible to construct a complete Parallel Sysplex installation with two connected mainframes. Moreover, a single mainframe can contain the internal equivalent of a complete physical Parallel Sysplex, useful for application testing and development purposes.

The IBM Systems Journal dedicated a full issue to all the technology components.
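
The role of a CF Cache structure in keeping buffers consistent between LPARs, mentioned earlier in this section, can be illustrated with a toy model. The sketch below is an assumption-laden simplification, not the CFCC interface: the class names `ToyCacheStructure` and `ToySystem`, the method names, and the page and system identifiers are all hypothetical. It shows only the general idea that a system registers interest in a page when it buffers it locally, and that a write by one system invalidates the stale local copies held by the others.

```python
from collections import defaultdict


class ToyCacheStructure:
    """Toy stand-in for a shared cache structure (not the real CF API)."""

    def __init__(self):
        self.shared = {}                   # page -> latest value of the shared data
        self.interest = defaultdict(set)   # page -> names of systems with a local copy
        self.systems = {}                  # name -> attached ToySystem

    def attach(self, system):
        self.systems[system.name] = system

    def read(self, system_name, page):
        # Reading registers the caller's interest in the page.
        self.interest[page].add(system_name)
        return self.shared.get(page)

    def write(self, system_name, page, value):
        self.shared[page] = value
        # Invalidate every other system's local copy of this page.
        for other in self.interest[page] - {system_name}:
            self.systems[other].invalidate(page)
        self.interest[page] = {system_name}


class ToySystem:
    """Toy stand-in for one member system with a local buffer pool."""

    def __init__(self, name, cf):
        self.name = name
        self.cf = cf
        self.buffers = {}   # page -> locally buffered value
        cf.attach(self)

    def invalidate(self, page):
        self.buffers.pop(page, None)

    def read(self, page):
        if page not in self.buffers:   # no valid local copy: fetch and register
            self.buffers[page] = self.cf.read(self.name, page)
        return self.buffers[page]

    def write(self, page, value):
        self.buffers[page] = value
        self.cf.write(self.name, page, value)


cf = ToyCacheStructure()
sys_a = ToySystem("SYSA", cf)
sys_b = ToySystem("SYSB", cf)

sys_a.write("page1", "v1")
print(sys_b.read("page1"))   # SYSB buffers v1 locally
sys_a.write("page1", "v2")   # SYSB's local copy is invalidated
print(sys_b.read("page1"))   # SYSB refetches and sees v2
```

Without the invalidation step, `SYSB` would keep serving its stale local copy after `SYSA` updated the page, which is the inconsistency the Cache structure is there to prevent.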


Server Time Protocol

Maintaining accurate time is important in computer systems. For example, in a transaction-processing system the recovery process reconstructs the transaction data from log files. If time stamps are used for transaction-data logging, and the time stamps of two related transactions are transposed from the actual sequence, then the reconstructed transaction database may not match the state that existed before the failure.

Server Time Protocol (STP) can be used to provide a single time source between multiple servers. Based on Network Time Protocol concepts, one of the System z servers is designated by the HMC as the primary time source (Stratum 1). It then sends timing signals to the Stratum 2 servers over coupling links. The Stratum 2 servers in turn send timing signals to the Stratum 3 servers. To provide availability, one of the servers can be designated as a backup time source, and a third server can be designated as an Arbiter to assist the Backup Time Server in determining whether it should take over the role of the Primary during exception conditions.

STP has been available on System z servers since 2005. More information on STP is available in the "Server Time Protocol Planning Guide".
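
The time-stamp problem described in this section can be made concrete with a small worked example. The sketch below is illustrative only: the `LogRecord` type, the `replay` function, the system names, and the one-second skew are all hypothetical. Two systems update the same record half a second apart, and recovery replays the log in time-stamp order; when one system's clock runs behind, the two updates are transposed and the replay ends with the wrong value.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogRecord:
    system: str        # system that wrote the log record
    true_time: float   # when the update really happened, in seconds
    value: int         # new value written to the shared record


def replay(records, clock_offset):
    """Rebuild the final value by applying records in time-stamp order.

    Each record's time stamp is its true time plus the offset of the
    clock on the system that logged it.
    """
    ordered = sorted(records, key=lambda r: r.true_time + clock_offset[r.system])
    final = None
    for record in ordered:
        final = record.value   # later time stamps overwrite earlier ones
    return final


records = [
    LogRecord("SYSA", true_time=10.0, value=100),
    LogRecord("SYSB", true_time=10.5, value=200),   # really the later update
]

synchronized = {"SYSA": 0.0, "SYSB": 0.0}
skewed = {"SYSA": 0.0, "SYSB": -1.0}   # SYSB's clock runs one second behind

print(replay(records, synchronized))   # 200, the correct final value
print(replay(records, skewed))         # 100, the updates were transposed
```

With a common time source the offsets stay close enough to zero that related updates keep their real order, which is the property a shared time source is meant to provide.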


Geographically Dispersed Parallel Sysplex

Geographically Dispersed Parallel Sysplex (GDPS) is an extension of Parallel Sysplex to mainframes located, potentially, in different cities. GDPS includes single-site and multiple-site configurations:
* GDPS HyperSwap Manager: This is based on synchronous Peer to Peer Remote Copy (PPRC) technology for use within a single data center. Data is copied from the primary storage device to a secondary storage device. In the event of a failure on the primary storage device, the system automatically makes the secondary storage device the primary, usually without disrupting running applications.
* GDPS Metro: This is based on synchronous data mirroring technology (PPRC) that can be used on mainframes located apart from each other. In a two-system model, both sites can be administered as if they were one system. In the event of a failure of a system or storage device, recovery can occur automatically, with limited or no data loss.
* GDPS Global - XRC: This is based on asynchronous Extended Remote Copy (XRC) technology with no restrictions on distance. XRC copies data on storage devices between two sites such that only a few seconds of data may be lost in the event of a failure. If a failure does occur, a user must initiate the recovery process. Once initiated, the process automatically recovers from secondary storage devices and reconfigures systems.
* GDPS Global - GM: This is based on asynchronous IBM Global Mirror technology with no restrictions on distance. It is designed for recovery from a total failure at one site. It will activate secondary storage devices and backup systems.
* GDPS Metro Global - GM: This is a configuration for installations with more than two systems/sites, for purposes of disaster recovery. It is based on GDPS Metro together with GDPS Global - GM.
* GDPS Metro Global - XRC: This is a configuration for installations with more than two systems/sites, for purposes of disaster recovery. It is based on GDPS Metro together with GDPS Global - XRC.
* GDPS Continuous Availability: This is a disaster recovery / continuous availability solution, based on two or more sites, separated by unlimited distances, running the same applications and having the same data to provide cross-site workload balancing. IBM Multi-site Workload Lifeline, through its monitoring and workload routing, plays an integral role in the GDPS Continuous Availability solution.
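
The practical difference between the synchronous (PPRC-based) and asynchronous (XRC or Global Mirror based) configurations above is how much recently written data can be missing at the secondary site when the primary fails. The sketch below is a simplified model under stated assumptions: the function `surviving_writes`, the write times, and the two-second replication lag are hypothetical, and real replication involves details (in-flight I/O, consistency groups) that are ignored here.

```python
def surviving_writes(writes, failure_time, mode, lag=0.0):
    """Return the writes present on the secondary when the primary fails.

    writes: list of (time, payload) tuples completed at the primary.
    mode:   "sync"  means a write completes only once the secondary has it;
            "async" means the secondary receives each write `lag` seconds later.
    """
    if mode == "sync":
        return [w for w in writes if w[0] <= failure_time]
    if mode == "async":
        return [w for w in writes if w[0] + lag <= failure_time]
    raise ValueError(f"unknown mode: {mode!r}")


# One write per second, then the primary site fails at t = 5.0 seconds.
writes = [(float(t), f"update-{t}") for t in range(1, 6)]
failure_time = 5.0

sync_copy = surviving_writes(writes, failure_time, "sync")
async_copy = surviving_writes(writes, failure_time, "async", lag=2.0)

print(len(writes) - len(sync_copy))    # 0 writes lost with synchronous mirroring
print(len(writes) - len(async_copy))   # 2 writes lost with a 2-second lag
```

In this model the asynchronous loss is bounded by the replication lag, which matches the statement that only a few seconds of data may be lost; the synchronous case trades that loss for a write path that must wait on the secondary.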


See also

* System z
* LPAR



External links


IBM Parallel Sysplex site

IBM GDPS page